suppressPackageStartupMessages(library(tidyverse))
# devtools::install_github("dgrtwo/gganimate")
# install.packages("cowplot")
library(gganimate)
# install ImageMagick in the terminal >> "brew install imagemagick"

Communicating uncertainty is hard. It’s hard because uncertainty can be convoluted and opaque. And it’s hard because a lot of people can’t get down with boxplots or measures of spread.

Take this toy data for example:

set.seed(2017)

df <- tibble(
    Seinfeld = round(rnorm(50, mean = 85, sd = 10)), 
    `Rick and Morty` = round(rnorm(50, mean = 90, sd = 5))
    ) %>% 
    mutate(episode = row_number()) %>% 
    gather(tv_show, rating, -episode) %>% 
    mutate(rating = ifelse(rating >= 100, 100, rating) / 100)

Here we have the (fake) ratings for 50 episodes of two TV shows, Rick and Morty and Seinfeld.

We have to convince the world that one of these shows is better. If we didn’t exactly know the distributions of this data, there would be some uncertainty in our answer.

To prove out which show is better we might push the data through a boxplot:

df %>% 
    ggplot(aes(x = tv_show, y = rating)) +
    geom_boxplot()

Or a density chart:

df %>% 
    ggplot(aes(x = rating, fill = tv_show)) + 
    geom_density(alpha = 1/2)

Or use some errorbars:

df %>% 
    group_by(tv_show) %>% 
    summarise(
        mean = mean(rating), 
        low = quantile(rating, 0.025),
        high = quantile(rating, 0.975)) %>% 
    ggplot(aes(x = tv_show, y = mean)) +
    geom_errorbar(aes(ymin = low, ymax = high))

Or we might even go HAM and build some model on it.
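
As a minimal sketch of what "some model" could look like, assuming a plain linear regression is what we reach for (the post itself never fits one), here is `lm()` on the same simulated ratings. The data is rebuilt in base R so the block runs on its own, and the names `ratings` and `fit` are just for illustration:

```r
# Rebuild the toy data in base R (same seed and draw order as the tibble above)
set.seed(2017)
ratings <- data.frame(
    tv_show = rep(c("Seinfeld", "Rick and Morty"), each = 50),
    rating = c(
        round(rnorm(50, mean = 85, sd = 10)),  # Seinfeld
        round(rnorm(50, mean = 90, sd = 5))    # Rick and Morty
    )
)
# Cap ratings at 100 and rescale to the 0-1 range
ratings$rating <- pmin(ratings$rating, 100) / 100

# Regress rating on show; "Rick and Morty" is the reference level (alphabetical)
fit <- lm(rating ~ tv_show, data = ratings)

print(coef(fit))     # intercept and the Seinfeld offset
print(confint(fit))  # 95% confidence intervals for both coefficients
```

The `tv_showSeinfeld` coefficient is the estimated difference in mean rating between Seinfeld and Rick and Morty, and `confint()` puts an interval around it. Useful, but it is still a table of numbers.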

But these options all kind of suck. They’re not super intuitive. And they aren’t all that convincing, because they intimidate a lot of people!

Instead of boxplots or density charts or regular errorbars we can hack errorbars to generate a proto-Hypothetical Outcome Plot:

df %>% 
    ggplot(aes(x = tv_show, y = rating)) +
    geom_errorbar(aes(ymin = rating, ymax = rating))

Hypothetical Outcome Plots (HOPs) are a way to visualize uncertainty the same way we experience it: as a series of countable events. The depth and theory behind HOPs are beyond the scope of this quick post, but if you’re interested in learning more, check out this Medium story by the UW Interactive Data Lab.
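
To make "countable events" concrete, here is a rough sketch that counts outcomes instead of summarising them. It assumes we pair the two shows by episode number, the same pairing the colour example later in this post uses. The variable names are made up for the sketch:

```r
# Rebuild the toy ratings in base R (same seed and draw order as above)
set.seed(2017)
seinfeld <- pmin(round(rnorm(50, mean = 85, sd = 10)), 100) / 100
rick_and_morty <- pmin(round(rnorm(50, mean = 90, sd = 5)), 100) / 100

# Treat each episode as one outcome: which show came out ahead (ties go to Rick and Morty)?
n_episodes <- length(seinfeld)
rick_wins <- sum(rick_and_morty >= seinfeld)

cat(sprintf(
    "Rick and Morty rates at least as high in %d of %d episodes\n",
    rick_wins, n_episodes
))
```

A statement like "X out of 50 episodes" is the kind of thing an animated HOP lets a viewer pick up by watching, without reading an interval.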

Extending our hacked errorbars with gganimate we can implement a HOP in just a few lines of R:

p <- df %>% 
    ggplot(aes(x = tv_show, y = rating, frame = episode)) +
    geom_errorbar(aes(ymin = rating, ymax = rating))

gganimate(p, title_frame = FALSE)

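Cool. Looks like Rick and Morty is the more consistent of the two, which is what we’d expect given how we simulated the data (a standard deviation of 5 versus 10 for Seinfeld).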
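
A note if you’re following along with a recent install: gganimate has since been rewritten, and as far as we can tell the `frame` aesthetic and the `gganimate()` function used above are not part of the CRAN release (1.0 and later). A rough equivalent of the basic HOP, assuming that newer API, might look like this, with `transition_manual()` showing one episode per frame:

```r
library(ggplot2)
library(gganimate)  # assumes the CRAN release (>= 1.0), not the version used in this post

# Rebuild the toy data in base R (same seed and draw order as above)
set.seed(2017)
df <- data.frame(
    episode = rep(1:50, times = 2),
    tv_show = rep(c("Seinfeld", "Rick and Morty"), each = 50),
    rating = c(
        round(rnorm(50, mean = 85, sd = 10)),  # Seinfeld
        round(rnorm(50, mean = 90, sd = 5))    # Rick and Morty
    )
)
df$rating <- pmin(df$rating, 100) / 100

# Zero-height errorbars as before; the transition replaces the old frame aesthetic
p <- ggplot(df, aes(x = tv_show, y = rating)) +
    geom_errorbar(aes(ymin = rating, ymax = rating)) +
    transition_manual(episode)
```

Rendering is then a call like `animate(p, nframes = 50, fps = 5)`, and `anim_save("hop.gif")` writes the last rendered animation to disk (the file name is made up). The ghosting and colour versions below would need their own translation, which this sketch doesn’t attempt.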

We can get a little fancy and add in some ghosting:

p <- df %>%
    ggplot(aes(x = tv_show, y = rating)) +
    geom_errorbar(aes(
        ymin = rating, ymax = rating, 
        frame = episode, cumulative = TRUE), 
        color = "grey80", alpha = 1/8) +
    geom_errorbar(aes(
        ymin = rating, ymax = rating, frame = episode), 
        color = "#00a9e0") +
    scale_y_continuous(
        limits = c(0, 1), 
        labels = scales::percent_format()) +
    theme(panel.background = element_rect(fill = "#FFFFFF")) +
    labs(title = "TV Shows", y = "Rating", x = "")

gganimate(p, title_frame = FALSE)

And some colour, marking the higher-rated show in each episode green and the other red:

df_col <- df %>%
    spread(tv_show, rating) %>% 
    mutate(better = ifelse(
        `Rick and Morty` >= Seinfeld, "Rick and Morty", "Seinfeld")) %>%
    gather(tv_show, rating, -episode, -better) %>%
    mutate(col = ifelse(better == tv_show, "green", "red"))

p <- df_col %>%
    ggplot(aes(x = tv_show, y = rating)) +
    geom_errorbar(aes(
        ymin = rating, ymax = rating, 
        frame = episode, cumulative = TRUE), 
        color = "grey80", alpha = 1/8) +
    geom_errorbar(aes(
        ymin = rating, ymax = rating, 
        color = col, frame = episode), 
        size = 1.5, alpha = 1/2, show.legend = FALSE) +
    scale_color_manual(values = c(green = "#8FB339", red = "#FF1B1C")) +
    scale_y_continuous(
        limits = c(0, 1), 
        labels = scales::percent_format()) +
    theme(panel.background = element_rect(fill = "#FFFFFF")) +
    labs(title = "TV Shows", y = "Rating", x = "")

gganimate(p, title_frame = FALSE)